Save Time, No More Retyping!



Congratulations on acquiring Readiris. This software package will undoubt- 
edly be of great help in recapturing your texts, tables, graphics and business 
cards. 

As efficient as computers are, you have to key in your information first. If 
you have ever retyped a 15 page report or a large table of figures, you know how 
tedious and time-consuming it can be. Use this state-of-the-art OCR package to 
automatically enter text in your applications and you'll acquire an unprecedented 
level of efficiency and comfort! 

Scan a printed or typed document, indicate the zones of interest - or have the 
system detect them for you -, execute the character recognition and export the 
document to your wordprocessor. Documents composed of many pages are pro- 
cessed from start to finish in a single effort. A few mouse clicks beat long hours 
of work as Readiris converts your paper and PDF documents into editable com- 
puter files: it's up to 40 times faster than manual retyping! 

With the automatic mode of operation, the user's effort is reduced to a single
click: you initiate the scanning and save the text result, and all intermediate
steps are taken care of by Readiris. After the recognition, you can send the
reading results directly to your favorite applications - be that a wordprocessor,
spreadsheet or web browser.

Readiris recognizes tabular data and recreates them as worksheets or as table 
objects inside your wordprocessor; your numeric data are immediately ready for 
further processing. 

Based on the Connectionist technology from I.R.I.S., Readiris represents the 
best OCR has to offer. Font-independent feature extraction is complemented by
self-learning techniques derived from a proprietary neural network. The system 
can learn new characters through context analysis: linguistic knowledge about 
syllables and words improves the OCR performance. 

Readiris supports up to 104 languages: all American and European languages 
are supported, including the Central-European languages, the Baltic languages, 
Greek and the Cyrillic ("Russian") languages. (Optionally, you can read four
Asian languages - Japanese, Simplified and Traditional Chinese and Korean.)
Readiris even copes with mixed alphabets: the software detects "Western" words
that pop up in Greek, Cyrillic and Asian documents - many untranscribable proper
names, brand names etc. are written using the Western symbols.

Readiris uses linguistics during the recognition phase, not after it. As a direct
result, Readiris recognizes documents of all kinds with top accuracy, including 
low-quality documents, faxes and dot matrix printouts. It copes beautifully with 
badly scanned and copied documents containing too light or dark font shapes. 
Joined characters ("ligatures") are resolved and fragmented forms, such as dot 
matrix symbols, are recomposed. 

User verification in pop-up style not only flags doubtful characters but also 
increases the system's precision. All solutions confirmed by the user are memo- 
rized, increasing speed and confidence as you go along. Using Readiris means 
rendering it more intelligent each time! This powerful learning tool allows you 
to train Readiris on special characters such as mathematical symbols and dingbats
but also to handle distorted fonts as you will find in real documents. 

To increase your productivity further, Readiris not only recognizes your texts, 
but can format them for you as well! Make use of "autoformatting" and Readiris
recreates a facsimile copy of the scanned document: the word, paragraph and 
page formatting of the original document are retained. 

Similar typefaces are used, the point sizes and typestyles as used in the source 
document are maintained across the recognition. The placement of columns, text 
blocks and graphics follows your original documents. And as Readiris supports 
greyscale and color scanning effortlessly, you can recapture any graphics - be 
they lineart, black-and-white photos or color illustrations. When a document 
contains tables, Readiris reorganizes them in real cells and recreates the cell 
borders of the original tables. 

In other words, Readiris allows you to archive a true copy of your documents
as editable, compact text files instead of scanned images! Various levels of
formatting are available; the choice is up to the user.






User 's Guide 



Readiris supports virtually all scanners through their Photoshop "plug-in" or
Twain drivers: every model that has a Photoshop "plug-in" or Twain module is
seamlessly supported.
I.R.I.S. owns the copyrights to the Readiris software, the OCR technology, the
BCR technology, this manual and the on-line help.

AutoFormat, Connectionist, Linguistic technology, the IBCR-II, the I.R.I.S.
logo and Readiris are trademarks of I.R.I.S.

Acrobat and Reader are (registered) trademarks of Adobe. Apple, AppleWorks, 
Mac OS and Safari are (registered) trademarks of Apple. Entourage, Excel, Internet 
Explorer and Word are (registered) trademarks of Microsoft. 
Chapter 1

Installation 

This chapter discusses the system requirements and installation of the Readiris 
software. 

System Requirements 



This is the minimal system configuration required to use Readiris on a com- 
puter equipped with the operating system Mac OS X: 

□ a Mac OS computer with a G3 processor. 

□ the operating system Mac OS X version 10.01. Version 10.2.x is rec- 
ommended. 

□ 110 MB of free hard disk space. 

This is the minimal system configuration required to use Readiris on a com- 
puter equipped with the operating system Mac OS 9.x: 

□ a Mac OS computer with a PowerPC processor. Readiris does not run
on 680x0 processor-based computers!

□ the operating system Mac OS 9.x. The system libraries QuickTime 4.0 
and CarbonLib 1.4 or later are required. (If necessary, CarbonLib 1.5 
will be installed by the Readiris installer.) 

□ 32 MB free RAM. 

□ 110 MB of free hard disk space. 



Installing the Readiris Software 



The Readiris software is delivered compressed. To install, it is mandatory to 
run the installation program. 

1. When booting your computer, select the appropriate Startup Disk.

If you are running the operating system Mac OS X on your computer, 
launch the Readiris installer under Mac OS X: doing so will install the 
necessary files to run Readiris as "native" software under Mac OS X 
and under Mac OS 9.x. 

The reverse does not hold: when the installer is run under Mac OS 9.x,
you install the software under Mac OS 9.x, but not under Mac OS X -
even if this system is present on your hard disk!

2. Insert the Readiris CD-ROM. 

3. Double-click on the Readiris installer and follow the on-screen
instructions.

You are recommended to use the "easy" installation - it places all the 
necessary files on your hard disk, including the sample images which 
are used in the tutorial of this manual. 
[Screenshot: the Readiris Installer with the option "Easy Install" selected.
Clicking the "Install" button installs the application and the sample images;
the folder "Readiris" is created in the folder "Applications" on the disk
"Mac".]


The Readiris folder is created automatically by the installation program under 
the "Applications" folder. 

[Screenshot: the Readiris folder under "Applications", containing the Readiris
application, the folder "Images", the "Read Me" file and the User's Manual.]


Installing Software Options 



There's a single software option available for the Readiris software: the "Asian 
OCR add-on". It allows you to read Japanese, Traditional Chinese, Simplified 
Chinese and Korean. 
By installing this option, specific documentation becomes available that
discusses how you can recognize Asian documents.

[Screenshot: the Readiris folder, which now also contains the documentation
"Reading Asian languages".]


Uninstalling the Readiris Software 



Uninstalling the Readiris software is very easy: run the installer again, select 
the installation option "Uninstall" and click the "Uninstall" button. (The same goes 
for the software options: run the "uninstaller" of these specific software options 
to erase them!) 

[Screenshot: the installation option menu of the Readiris Installer, listing
"Easy Install" and "Custom Install".]


Register to Vote! 



We invite you to register your Readiris licence by submitting a registration
form on the I.R.I.S. web site - this method obviously requires an Internet
connection! You can access the registration form with the command "Register
Readiris" under the "Help" menu.


You can register in many ways, not just via the web: by faxing or sending in
your registration card and by calling I.R.I.S. during working hours.

[Screenshot: the "Contacting I.R.I.S." page, which reads as follows.]

Contacting I.R.I.S.

To get product support, you can contact I.R.I.S. by e-mail at the address
support@irislink.com.

Please describe the phenomenon you experience clearly and include all relevant
data concerning Readiris and your computer system.

The "About Readiris" dialog under the "Readiris" menu and the command "I.R.I.S.
on the Internet" under the "Help" menu give direct access to the I.R.I.S. home
page (www.irislink.com).

I.R.I.S.
Image Recognition Integrated Systems
Rue du Bosquet 10,
1348 Louvain-la-Neuve
Belgium

Tel: 32-10-45 13 64

I.R.I.S. Inc.
Image Recognition Integrated Systems
Delray Office Plaza
4731 West Atlantic Avenue Suite B1-B2
Delray Beach, FL 33445
USA

Tel: 1-561-921-0847 / 800-447-4744
Fax: 1-561-921-0854

E-mail info: info@irislink.com
E-mail sales: sales@irislink.com
E-mail support: support@irislink.com

I.R.I.S. home page: http://www.irislink.com
Readiris web site: http://www.readiris.com
On-line shop: http://shop.irislink.com


Registering your Readiris licence allows us to keep you informed of future 
product developments and related I.R.I.S. products. The registration benefits, 
including free product support and special offers, are strictly limited to reg- 
istered users. 
Comfort Isn't Laziness!



Some additional steps can be completed for maximal ease of use of Readiris. 

On a Mac OS X system, drag the Readiris application to the dock to make it 
available at all times. (You can drag the application away from the dock to re- 
move it again.) Also know that the dock is personal: each user that logs on to a 
machine may have his own set of applications on the dock! 












Under Mac OS 9.x, it may be useful to create an alias. (Use the command 
"Make Alias" of the Finder's "File" menu to do so.) As a result, you'll be able to 
start the Readiris software directly from your desktop. Also, you can add Readiris 
to the folder "Apple Menu Items". The software documentation that came with 
your Macintosh can tell you more about aliases and the Apple menu. 



Installing Your Scanner under Readiris 



Readiris exploits the Photoshop "plug-in" or Twain driver of each scan- 
ner to support it. In other words, as soon as there's a Photoshop "plug-in" or 
Twain driver available for your scanner model, Readiris supports it effortlessly! 

Under Mac OS X, use the "carbonized" Photoshop "plug-in" or Twain driver
or the "native" Photoshop "plug-in". Under Mac OS 9.x, the "normal" or
"carbonized" Photoshop "plug-in" or the Twain driver must be installed.

Here's how you install your scanner under Readiris. 

Using the Photoshop "plug-in" 

1. Install the scanner drivers using the CD-ROM that comes with your
scanner. Doing so will install the Photoshop "plug-in" on your computer.
(If necessary, study the installation instructions that accompany your
scanner carefully to ensure that these drivers are installed properly.)

2. Verify that the scanner operates correctly with any scanning application
other than Readiris.

3. Locate the Photoshop "plug-in" on your hard disk and copy it to your
system's "Application Support" folder.

4. Start up the Readiris software. 

5. Select your "plug-in" under Readiris with the option "Scanner" in the 
"Preferences" command under the "Readiris" menu. That shouldn't be 
too hard: your Photoshop "plug-in" will be the only scanner driver avail- 
able under the "Scanner" option. 

[Screenshot: the "Scanner" preferences with the "ScanWise Plugin" selected as
scanner driver, next to the options "Invert Image" and "Digital camera".]


Using the Twain driver 

1. Install the scanner drivers using the CD-ROM that comes with your 
scanner. Doing so will install the Twain driver on your computer. (If 
necessary, study the installation instructions that accompany your scan- 
ner carefully to ensure that these drivers are installed properly.) 

2. Verify that the scanner operates correctly with any scanning application
other than Readiris.

3. Start up the Readiris software. 

4. Select your scanner model under Readiris with the option "Scanner" in 
the "Preferences" command under the "Readiris" menu. 

[Screenshot: the "Scanner" preferences with "EPSON TWAIN" selected as scanner
model, next to the options "Invert Image" and "Digital camera".]

More about scanner support can be found in the "Read Me" file that comes 
with the Readiris software. 

Don't hesitate to contact your scanner manufacturer or its representative should
there be problems with scanner drivers. Most manufacturers allow you to
download the latest versions of the scanner drivers from their web site.

Getting Product Support 



The Readiris "Read Me" file details how you can get technical support.
Among other things, you can contact I.R.I.S. by e-mail at the address
support@irislink.com.

Please describe the phenomenon you experience clearly and include all rel- 
evant data concerning Readiris, your scanner and your computer system. 

Getting in Touch with I.R.I.S. 



You can also contact I.R.I.S. to learn more about its range of software solu- 
tions. 

The Readiris startup screen and the command "I.R.I.S. on the Internet" under
the "Help" menu of Readiris bring you directly to the I.R.I.S. home page
(www.irislink.com).
Chapter 2

Guided Tour 

Readiris is a state-of-the-art OCR package equipped with numerous advanced 
features. We will discuss all major features in this chapter and add many tips and 
hints concerning the use of Readiris. 

Starting the Software up 



Double-click on the Readiris application in the Readiris folder (under "Appli- 
cations") or click the application icon on the dock. (On a computer running Mac 
OS 9.x, you can double-click the alias for the Readiris application on your desk- 
top.) 



[Screenshot: the Readiris folder with the Readiris application, the folder
"Images", the "Read Me" file and the User's Manual.]

The Readiris startup screen and the menu bar of the Readiris software are
displayed. The startup screen displays the version and copyrights of the Readiris
software. It also gives direct access to the I.R.I.S. homepage - simply click on
the URL www.irislink.com to visit the I.R.I.S. web site.







Discovering the Readiris Interface 



The Readiris application not only contains a menu bar but also an image 
window and several toolbars that give quick access to the most frequent com- 
mands. 

The vertical main toolbar gives quick access to all frequent general
commands, the horizontal image toolbar contains all common commands you need
during the image preview.



[Screenshot: an untitled image window with the main toolbar and the image
toolbar.]



To learn which command corresponds to a certain button, hold your mouse 
pointer over it for a while: the status bar of the image window will tell you what 
the button does. (The window pane or image zone is where the scanned images 
are displayed.) 

[Screenshot: with the mouse pointer over the "Acquire" button, the status bar
reads "Acquire a document with your scanner".]



The status bar also displays all system information and gives information on 
the current image - the image size (in image pixels and in KB) and the image 
resolution. (When the image window is too small, some information may not be 
visible.) 
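
As a worked example of these figures, the sketch below converts a status bar
reading such as 2110x2615 pixels at 300 dpi into the physical page size. The
function name is hypothetical and only serves this illustration; it is not part
of Readiris.

```python
MM_PER_INCH = 25.4


def physical_size(width_px, height_px, dpi):
    """Return the (width, height) of a scanned image in inches."""
    return width_px / dpi, height_px / dpi


# Values taken from a status bar reading: 2110x2615 pixels at 300 dpi.
width_in, height_in = physical_size(2110, 2615, 300)
print(f"{width_in:.2f} x {height_in:.2f} inches")
print(f"{width_in * MM_PER_INCH:.0f} x {height_in * MM_PER_INCH:.0f} mm")
```

Dividing the pixel counts by the resolution is all it takes, which is why the
same page scanned at a higher resolution yields a proportionally larger image.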

[Screenshot: for the sample image "Autoform" (page 1 of 1), the status bar
displays the image information "2110x2615x24  5434K  300dpi".]



Getting Started with a First Tutorial 



The best way to become familiar with the operation of Readiris is undoubtedly 
by using it. A number of prescanned images is provided with the software; 
they allow you to get started even when there is no scanner connected to your 
computer. Let's turn to them now. 

Readiris allows you to scan images with your scanner and to open prescanned
images. To open prescanned images, select "File" as image source and use the
"Open" button; to acquire images with your scanner, select your scanner as image
source and use the "Acquire" button. (You can also set the image source with the
"Preferences" command under the "Readiris" menu, and you can acquire images
with the commands "Open Document" and "Acquire Document" under the "File"
menu.)



[Screenshot: the buttons "Acquire" and "Open".]



Color, greyscale and black-and-white images are supported on an equal basis: 
Readiris allows you to open FlashPix images, GIF images, JPEG images, MacPaint 
images, Photoshop images, PICT images, PNG images, QuickDraw GX images, 
QuickTime images, Silicon Graphics images, Targa images, (uncompressed, 
packbits and Group 3 compressed) TIFF images, multipage TIFF images and 
Windows bitmaps (BMP). (Readiris also opens Adobe Acrobat PDF documents.) 

Loading prescanned images is particularly useful to convert your faxes into 
editable text files. 

Select "File" as image source, click the "Open" button and go to the
folder "Images" under the Readiris folder.

[Screenshot: the "Open Document" dialog showing the folder "Images" with the
sample images Alphabets.tif, Autoform.jpg, Brazilian.jpg, Czech.jpg, Deskew.jpg,
Digital.jpg, Dutch.jpg, English.jpg and French.jpg.]



Double-click the image English.jpg in the image folder or click the image once 
and click the button "Open". The image is read from disk and displayed in the 
image zone. 

[Screenshot: the sample image English.jpg, a one-page text titled "A word about
OCR", displayed in the image window.]

For every greyscale and color image, a black-and-white version is generated
for the OCR process.
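
How a black-and-white version can be derived from a color image is sketched
below. This is a generic fixed-threshold binarization, assumed purely for
illustration: the manual does not say which method Readiris uses, and the
luminance weights and the threshold of 128 are assumptions.

```python
def to_greyscale(pixel):
    """Convert an (R, G, B) pixel (0-255 each) to one grey level (0-255)."""
    r, g, b = pixel
    # Common luminance weights; an assumption, not Readiris's formula.
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def binarize(image, threshold=128):
    """Map each RGB pixel to 1 (black ink) or 0 (white paper)."""
    return [[1 if to_greyscale(p) < threshold else 0 for p in row]
            for row in image]


sample = [[(255, 255, 255), (20, 20, 20)],
          [(200, 180, 160), (90, 40, 30)]]
print(binarize(sample))
```

Light pixels end up as paper and dark pixels as ink, whatever their original
color.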

[Screenshot: the progress message "Converting page 1".]



To display a greyscale or color image as black-and-white, disable the option 
"Image in Color" under the "View" menu. 

There's another way to import image files into Readiris. Drop them on the 
Readiris icon: Readiris starts up and the image file is opened automatically. 




The image toolbar contains all the commands you need during the image pre- 
view: tools to analyze the page, to indicate the zones of interest, to rotate the 
image etc. 

Zooming in on Images 



Readiris has several commands that allow you to zoom in on the scanned 
image, for instance to verify the scanning quality. 

Click the "Zoom Level" button on the image toolbar (or go to the "View" menu)
to discover the zoom levels: you can display the image at actual size and at
50% and 200% of its actual size, fit the image to the page width and fit the
entire image in the preview window. At actual size, a screen pixel corresponds to
an image pixel. (Shortcuts are available for all zoom levels!)
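
The zoom levels boil down to simple scale factors. The sketch below, with
hypothetical function names and an assumed preview window of 800 by 600 pixels,
computes the factor behind each level; at actual size the factor is 1, so one
screen pixel shows one image pixel.

```python
def fit_to_width(image_w, window_w):
    """Scale factor that makes the image exactly as wide as the window."""
    return window_w / image_w


def fit_to_window(image_w, image_h, window_w, window_h):
    """Scale factor that shows the entire image: the smaller ratio wins."""
    return min(window_w / image_w, window_h / image_h)


def displayed_size(image_w, image_h, scale):
    """Size in screen pixels of the image at the given scale factor."""
    return round(image_w * scale), round(image_h * scale)


image_w, image_h = 2110, 2615      # image size in pixels
window_w, window_h = 800, 600      # assumed preview window size

for name, scale in [("Actual size", 1.0), ("200%", 2.0),
                    ("Fit to width", fit_to_width(image_w, window_w)),
                    ("Fit to window", fit_to_window(image_w, image_h,
                                                    window_w, window_h))]:
    print(name, displayed_size(image_w, image_h, scale))
```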



[Screenshot: the zoom menu with the commands "Fit to Window", "Fit to Width",
"50% Actual Size", "Actual Size" and "200% Actual Size".]

Note that the current zoom level is indicated in the window title - there's no 
zoom level mentioned when the image fits the window or the page width. 

[Screenshot: the window titles "English", "English @ 50%", "English @ 100%" and
"English @ 200%".]

You can also Command-click the mouse button over a region of the scanned 
image to zoom in at real size immediately. Command-click a second time to zoom 
out again. As soon as you press the Command key over the image preview, the 
mouse cursor is adapted! 

[Screenshot: the sample image "A word about OCR" displayed at different zoom
levels.]



Finally, the magnifying glass allows you to zoom in on specific details of the 
acquired images. Click the button "Magnifying Glass" on the image toolbar (or 
Shift-click) and drag the mouse across the image. 

One, Decomposing a Scanned Image



Now that the image is scanned, you have to indicate which parts you want to 
convert into editable text by drawing frames, so-called "windows", around the 
zones of interest. 

Actually, Readiris will do this for you automatically when the option "Page 
Analysis" under the "Options" button (or under the "Layout" menu) is enabled. 
The page analysis is enabled by default. 



To force Readiris to decompose the current page - because you disabled page 
analysis by accident, because you erased some windows erroneously and want 
to redo the page analysis etc. -, you can simply click the button "Analyze Page" 
on the image toolbar (or click the command "Analyze Page" under the "Process" 
menu). 






Select the document language before executing the page analysis when you 
are dealing with Asian documents. Specific routines are used for these languages: 
the interline spacing of Asian documents is in most cases bigger than in Western 
documents, the text is made up of small icons ("ideograms") that could easily be 
seen as graphic zones in Western documents and the text may run from top to 
bottom, from right to left. And if you forgot to select the proper language, select 
it afterwards. Readiris re-executes the page analysis automatically! 

Automatic page decomposition is particularly useful when columnized texts 
and documents with a complex page layout, possibly including graphics and tables, 
are recognized. 

Page decomposition uses three window types: text, graphic and table win- 
dows. Readiris discriminates text blocks, tables and graphic zones containing 
photos, illustrations etc. on the page. (Saving graphics and recognizing tables will 
be discussed at great length below.) A specific icon marks each zone type. 






Also note that you can Ctrl-click a zone to change its type (and to delete it)! 

[Screenshot: the contextual menu of a zone with the commands "Graphic Zone",
"Table Zone" and "Delete Zone".]

Page analysis is fast, skew-tolerant and highly accurate: it traces complex,
"irregular" shapes.




The page analysis will even detect zones where you get white text on a 
black background. Recognizing such inserts is no problem: while the preview 
displays the scanned document correctly on-screen, Readiris "inverts" the image 
when the need arises to recognize such text blocks! 
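
The idea of inverting a zone before recognition can be sketched as follows.
The bitmap representation (1 for black, 0 for white) and the "more than half
black" test are assumptions made for this illustration; the manual does not
describe how Readiris detects such zones.

```python
def needs_inversion(bitmap):
    """True when more than half of the pixels in the zone are black (1)."""
    total = sum(len(row) for row in bitmap)
    black = sum(sum(row) for row in bitmap)
    return black * 2 > total


def invert(bitmap):
    """Swap black and white pixels."""
    return [[1 - pixel for pixel in row] for row in bitmap]


def normalize(bitmap):
    """Return the zone as dark text on a light background."""
    return invert(bitmap) if needs_inversion(bitmap) else bitmap


white_on_black = [[1, 1, 1],
                  [1, 0, 1],
                  [1, 1, 1]]
print(normalize(white_on_black))
```

Only the copy handed to the recognition is inverted; the preview can keep
showing the zone as it was scanned.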

Some documents have many "stray" dots on the page, may generate a black 
page border around the actual image etc. To erase all small windows - it's as- 
sumed they don't contain any text - and re-sort the remaining zones, you can 
click the command "Delete Small Zones" under the "Layout" menu. 
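
What such a clean-up amounts to is sketched below. The zone representation
(dictionaries with a position and a size in pixels) and the minimum side of 12
pixels are assumptions made for this example; the manual does not state which
size Readiris considers "small".

```python
def delete_small_zones(zones, min_side=12):
    """Keep only zones whose width and height reach min_side pixels."""
    return [z for z in zones if z["w"] >= min_side and z["h"] >= min_side]


def renumber(zones):
    """Number the remaining zones 1, 2, 3, ... in their current order."""
    return [dict(zone, number=i) for i, zone in enumerate(zones, start=1)]


zones = [
    {"x": 100, "y": 120, "w": 900, "h": 400},   # text block
    {"x": 15, "y": 30, "w": 4, "h": 3},         # stray dot
    {"x": 100, "y": 600, "w": 900, "h": 700},   # text block
]
kept = renumber(delete_small_zones(zones))
print([(z["number"], z["y"]) for z in kept])
```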






One and a Half, Sorting Windows 



Readiris not only detects the various blocks, but also sorts them: the zones are 
sorted top-down, left to right by default to cope with columnized documents. 
Numbers indicate the sort order. 
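
One way to picture this default order is the sketch below: zones are grouped
into columns by their left edge, the columns are read left to right and the
zones inside a column top-down. The grouping rule and its tolerance of 20
pixels are assumptions made for this example, not the documented Readiris
algorithm.

```python
def sort_zones(zones, column_tolerance=20):
    """Order zones column by column: left to right, top-down in a column."""
    columns = []  # each entry: [left edge of the column, zones in it]
    for zone in sorted(zones, key=lambda z: z["x"]):
        if columns and zone["x"] - columns[-1][0] <= column_tolerance:
            columns[-1][1].append(zone)
        else:
            columns.append([zone["x"], [zone]])
    ordered = []
    for _, members in columns:
        ordered.extend(sorted(members, key=lambda z: z["y"]))
    return ordered


zones = [
    {"name": "A", "x": 50, "y": 400},
    {"name": "B", "x": 52, "y": 100},
    {"name": "C", "x": 400, "y": 90},
    {"name": "D", "x": 405, "y": 500},
]
print([z["name"] for z in sort_zones(zones)])
```

In a two-column page the whole left column is thus numbered before the right
column, which keeps the text of each column together.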

[Screenshot: the zones of the sample image, numbered in their sort order.]

Evidently, you can modify the sort order. To do so, click the "Sort" button (or
use the command "Sort Zones" under the "Layout" menu).

The mouse cursor changes as soon as the "sort mode" is enabled.

Click on the windows you want to include. Windows you do not click on are
simply ignored and excluded from recognition. It's easy to see which zones are
selected and which aren't: the selected windows are numbered, the non-selected
windows aren't.

[Screenshot: the sample image in "sort mode"; only the selected zones are
numbered.]

Two, Windowing a Scanned Image Manually

Page analysis is the automatic way of zoning a scanned page. Alternatively 
you can zone an image manually with the windowing tools of Readiris. These 
are available on the image toolbar and under the "Layout" menu. 

[Screenshot: the windowing tools on the image toolbar, including "Draw Graphic
Zones" and "Draw Table Zones".]

To draw a rectangle around a zone of interest, select the corresponding tool 
in the image toolbar (or under the "Layout" menu), click the cursor in the upper 
left corner of the window, stretch the window by moving the mouse to the lower 
right corner and click again. (Sides smaller than 1 mm are not allowed, they 
wouldn't even contain a single character anyway.) 
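
How many pixels 1 mm represents depends on the scanning resolution. The sketch
below works this out; the function names are hypothetical, and rounding up to
whole pixels is an assumption about how the limit would be applied.

```python
import math

MM_PER_INCH = 25.4


def min_side_pixels(dpi, min_mm=1.0):
    """Smallest whole number of pixels that spans at least min_mm."""
    return math.ceil(dpi * min_mm / MM_PER_INCH)


def is_valid_window(width_px, height_px, dpi):
    """True when both sides of the window measure at least 1 mm."""
    limit = min_side_pixels(dpi)
    return width_px >= limit and height_px >= limit


print(min_side_pixels(300))
print(is_valid_window(10, 200, 300))
```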

The windows are automatically sorted in the order of creation: numbers indi- 
cate the sort order. The status bar of the image window tells you how many 
zones of each type were created. 

[Status bar: "1 text zone(s) - 1 graphic zone(s) - 1 table zone(s)"]

You can also frame "irregular" text blocks by drawing polygonal windows
around them. Non-rectangular windows are created by merging rectangular zones: 
as soon as two rectangles (of the same type) intersect, they become a single 
window automatically! In a way, you're building a house by adding one room 
after the other... (Creating polygonal table windows doesn't make any sense.) 
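
The merging rule can be sketched as follows. Rectangles are given as (left,
top, right, bottom) tuples and are assumed to be of the same type; the function
names and the representation of a window as a list of rectangles are choices
made for this example only.

```python
def intersects(a, b):
    """True when rectangles a and b (left, top, right, bottom) overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_windows(rects):
    """Group rectangles into windows; intersecting ones share a window."""
    windows = []  # each window is a list of rectangles
    for rect in rects:
        touching, apart = [], []
        for window in windows:
            hit = any(intersects(rect, r) for r in window)
            (touching if hit else apart).append(window)
        # The new rectangle joins every window it touches into one.
        merged = [rect] + [r for window in touching for r in window]
        windows = apart + [merged]
    return windows


rooms = [(0, 0, 100, 50), (80, 40, 200, 120), (300, 300, 350, 350)]
for window in merge_windows(rooms):
    print(len(window), "rectangle(s):", window)
```

The first two rectangles overlap and form one polygonal window; the third one
stays a window of its own.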

Furthermore, manual zoning can be combined with window sorting: you can
draw new windows even when the "sort mode" is enabled. You then use sorting 
to include a number of detected windows and manually create some other win- 
dows where the page analysis didn't yield the appropriate results. As soon as you 
start creating windows in the "sort mode", all windows you didn't select are 
promptly erased! 



To modify, move and delete windows, you need to select them first. To do so, 
choose the window selection tool in the image toolbar and click inside a window. 
Rectangular markers now appear at each corner and in the middle of the window 
sides. 

To unselect windows, click the mouse button elsewhere. To select addi- 
tional windows, hold down the Shift key while clicking on these extra windows. 

So much for selecting zones. To modify a window, select it, put your mouse 
cursor over a marker and drag the side to change the window size. 

To move a window, simply select it and drag it to another location. 

To delete windows, select the window(s) and choose the "Cut" or "Clear" 
command from the "Edit" menu. The "Cut" command cuts the window(s) to an 
internal buffer, "Clear" erases the window(s) irretrievably. When you paste zones, 
they are inserted in their original position, and you have to drag them to their new 
location. 

In fact, all familiar commands from the "Edit" menu apply to the windows: you
can delete, cut, copy and paste them! The "Undo" command also applies: if you 
have unfortunately deleted, moved, resized etc. some zones, "Undo" will cancel 
the last operation. 



[Screenshot: the "Edit" menu with the commands "Undo", "Copy", "Paste", "Clear"
and "Select All".]








Also note that shortcuts are available for all commands! Let's give an ex- 
ample: to erase all existing windows, you can choose the command "Select All" or 
its shortcut Command-A and click the command "Clear" or its shortcut Backspace. 
Alternatively, you can use the command "Delete All Zones" under the "Layout" 
menu to erase all windows simultaneously. 




You are now ready to recreate the necessary layout. To restore the previous 
layout, you can choose "Undo" or the shortcut Command-Z. Or click "Undo" 
once more to erase the windows a second time... 



Saving Windowing Templates



The resulting windowing layouts can be saved as zoning templates for 
future use with the command "Save As" under the "Layout" menu and loaded into 
memory with the command "Open" under the "Layout" menu. (There's a specific 
command to allow you to quickly save the current layout again!) 


If you have to recognize documents with a similar layout, for instance a 50 
page report where the header and footer should be excluded for obvious reasons, 
a single template can be applied to zone all 50 pages. 

When you load a template into memory, the page analysis is disabled auto- 
matically. The zoning template remains active until you re-enable the page analy- 
sis. 
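
The idea of one template zoning every page can be sketched in Python. This is only an illustration, not the way Readiris stores its templates: the `Zone` class, its pixel coordinates and the function `zone_pages` are hypothetical names invented for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Zone:
    """A rectangular window in pixel coordinates (hypothetical representation)."""
    kind: str     # "text", "graphic" or "table"
    left: int
    top: int
    right: int
    bottom: int

# One template that keeps the body text and leaves out the header and footer.
template = [Zone("text", 150, 400, 2330, 3100)]

def zone_pages(page_count, template):
    """Give every page its own copy of the template instead of analyzing the page."""
    return {page: list(template) for page in range(1, page_count + 1)}

layout = zone_pages(50, template)
print(len(layout), layout[50][0].kind)  # 50 text
```

Each page receives a copy of the zone list, so adjusting the windows on one page would not alter the others.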

Actually, there's a nice alternative for zoning templates: the preview tool "Ig- 
nore Exterior Area" limits the page decomposition to the "cropped" portion of the 
image. 



Select this tool and frame the portion of the image you want to process. When 
you're dealing with a multipage document, you can exclude the same outer zone 
from page analysis on every page. (Re-execute the page analysis to cancel the
image "cropping", or change the zones manually.)

Readiris Takes You around the World



Assuming that the windows are correctly defined, you are now almost ready 
to execute the character recognition. We say "almost", because we haven't veri- 
fied the language and document settings yet. 



The language setting can be found on the main toolbar. 




Click the option "Other" to display the long list of languages that were not 
selected recently. 






Readiris is far from limited to English: up to 104 languages are supported! All
American and European languages are supported, including the Central-Euro- 
pean languages, Greek, Turkish, the Cyrillic ("Russian") and the Baltic languages. 

Optionally, you can read Asian documents: the extra module "Asian OCR 
add-on" offers recognition of Japanese, Simplified Chinese, Traditional Chinese 
and Korean. (Simplified Chinese is used on China's mainland and in Singapore,
whereas Traditional Chinese is used in Hong Kong, Taiwan, Macau and the overseas
Chinese communities.)

Also note that the British and American - or should we say "international"? - 
variants of the English language are distinguished. 



Selecting the proper document language is imperative. Based on the selection 
of a language, the software knows which symbol set to recognize. Multi-linguistic
support ensures that "exotic" characters such as ç, ß, ñ, ý and ø are
recognized correctly.

Secondly, the software extensively uses linguistic databases to validate its 
results. Suppose that you have to read the word "president" where an ink stain
makes the "r" look like an "f". Looking things up in the English lexicon, Readiris
will detect autonomously that the word "president" is being read and that it doesn't
make any sense to recognize the symbol "f". This "self-learning" technique is
of course highly dependent on the linguistic context.

Linguistics offer useful help to solve ambiguous cases such as an "O" which
might be mistaken for a "0". Another typical example is the letter "l" and number
"1" which have an identical form in many fonts - think of texts produced on old
typewriters! The linguistic context helps to determine whether you are dealing
with "l" or "1".
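
The lexicon lookup described above can be sketched in a few lines of Python. This is an illustration of the idea only, not a description of how Readiris works internally: the tiny `LEXICON`, the `CONFUSABLE` table and the function `resolve` are all hypothetical names made up for this example.

```python
from itertools import product

# Hypothetical mini-lexicon; a real system would use a full linguistic database.
LEXICON = {"president", "well", "look", "hello"}

# Shapes that are easily confused, mapped to the candidates worth trying.
CONFUSABLE = {
    "f": "fr",   # an ink stain can make "r" look like "f"
    "1": "1l",   # digit one versus lowercase L
    "0": "0o",   # digit zero versus the letter O
}

def resolve(raw_word):
    """Return a lexicon word reachable by swapping confusable shapes, else the raw word."""
    options = [CONFUSABLE.get(ch, ch) for ch in raw_word.lower()]
    for candidate in product(*options):
        word = "".join(candidate)
        if word in LEXICON:
            return word
    return raw_word  # the context did not help; the user would have to decide

print(resolve("pfesident"))  # president
print(resolve("we11"))       # well
print(resolve("1950"))       # 1950
```

The last call shows the fallback: no lexicon word can be formed, so the string is left as read.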

The illustration below shows various shapes of "1" and "l". The shapes on the
first line are unambiguous, the shapes on the second line are ambiguous, but 
linguistics can solve them. When the context does not suffice, the user inter- 
venes. 

193 1950S, ihr 

Wefl, Rossellini 

Readiris Changes Languages As Needed 



But the buck doesn't stop here: Readiris can switch languages in the middle of
a sentence without any help from the user! When Western words pop up in
Greek, Cyrillic or Asian documents - many untranscribable proper names, brand
names etc. are written using the familiar Western symbols -, Readiris can switch
to the correct alphabet automatically. In other words, you can activate a mixed
alphabet of Greek, Cyrillic or Asian and Western characters.

Be sure to select "Greek-English" or the appropriate Cyrillic language setting, 
for instance "Byelorussian-English". In other words: don't try to just select "Greek" 
or "Byelorussian" as document language and hope that the Western symbols will 
come out fine! 


Here's an example where a Russian text contains some English words - open 
the image file Alphabets.tif if you want to try it for yourself! 








To mix other languages, simply select the language with the most extended
character set. If you have a document where, say, a French translation is placed
alongside an English text, you have to select French as language to ensure that
the accented characters such as ç, é and ù get recognized correctly.
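
The rule "select the language with the most extended character set" amounts to a superset test. The sketch below is a simplified illustration; the heavily abridged `SYMBOL_SETS` and the function `pick_language` are assumptions made for this example and do not reflect the actual symbol sets of Readiris.

```python
# Hypothetical, heavily abridged symbol sets; real language definitions are far larger.
BASIC_LATIN = set("abcdefghijklmnopqrstuvwxyz")
SYMBOL_SETS = {
    "English": BASIC_LATIN,
    "French": BASIC_LATIN | set("àâçéèêëîïôùûü"),
    "German": BASIC_LATIN | set("äöüß"),
}

def pick_language(languages):
    """Pick the language whose symbol set covers every other language in the mix."""
    for name in languages:
        if all(SYMBOL_SETS[other] <= SYMBOL_SETS[name] for other in languages):
            return name
    return None  # no single setting covers all the languages

print(pick_language(["English", "French"]))  # French
print(pick_language(["French", "German"]))   # None
```

With these toy sets, French covers English, while neither French nor German covers the other.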

Defining the Document Characteristics 



Now that the language is set, we'll turn to the other document characteristics. 
You can fine-tune the recognition by specifying some document features: the font 
type and character pitch. (These commands do not apply to Asian documents.) 
Let's clarify what this means. 

Let's start with the command "Font Type" under the "Settings" menu. The font 
modes separate "normal" documents from dot matrix printed documents. "Draft" 
or "9 pin" dot matrix symbols are made up of isolated, separate dots, and highly 
specialized recognition routines are used to recognize them. 

"Letter quality" dot matrix printing, also called "24 pin" or "NLQ" dot matrix,
requires the "normal" setting, as do the printing qualities typeset, typewritten,
laser printed and inkjet printed.






The setting "Automatic" means that Readiris will detect the font mode
automatically. Let Readiris "auto-detect" the font mode in all cases - unless you are
sure dot matrix documents are being read! (Obviously, "Automatic" is the default
value.)

The tooltip of the "Recognize" button indicates the selected font type - auto- 
matic detection or dot matrix. 

Recognize the document - dot matrix font 

Recognize the document - font detection

The character pitch can be set with the command "Character Pitch" under 
the "Settings" menu. 




With fixed or "monospaced" fonts, all symbols of the font have the same 
width. An "i" takes up as much horizontal space on a line as a 
"w", as is the case in this sentence. Think of documents produced 
using a typewriter, where the carriage moves a fixed distance for each typed 
symbol. 

A proportional pitch means that the width of a character depends on its shape.
Symbols like "m" and "w" are wider and take more horizontal space on a line than the
"thin" characters "l" or "j". Virtually all books, magazines and newspapers are
printed in proportional pitch.

The simplest solution is to leave this option at all times on the default value 
"Automatic", which means that Readiris will detect the character pitch automati- 
cally. 
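
To make the difference between the two pitches concrete, here is a minimal Python sketch that classifies a line from the widths of its glyphs. It is an assumption about how such a test could look, not the detection method Readiris uses; the function `classify_pitch`, its `tolerance` parameter and the sample widths are invented for the example.

```python
def classify_pitch(widths, tolerance=1):
    """Return "fixed" when all glyph widths agree within `tolerance` pixels."""
    if not widths:
        raise ValueError("at least one glyph width is required")
    return "fixed" if max(widths) - min(widths) <= tolerance else "proportional"

# Hypothetical glyph widths in pixels for the letters i, w, m and j.
print(classify_pitch([30, 30, 31, 30]))  # fixed
print(classify_pitch([9, 38, 42, 10]))   # proportional
```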



Readiris Gets More Intelligent Each Time! 

When the document language is selected and document characteristics are 
set, you can click the "Recognize" button on the main toolbar (or the command 
"Recognize Document" under the "Process" menu). 






The OCR progress is indicated on-screen. You can press the Escape key to
abort the text recognition.

Readiris will enter the interactive learning phase at the end of the recognition
when the learning is enabled. Interactive learning is enabled by default.

Font training can substantially enhance the accuracy of the recognition sys- 
tem. When the user tries to read distorted, defaced forms as are found in real 
documents or stylized font shapes which Readiris does not recognize optimally, 
training can overcome this temporary "failure". 

User learning is also used to train the system on special symbols which 
Readiris is unable to recognize, such as mathematical and scientific symbols and 
dingbats. Some examples: Readiris can be trained to recognize the Greek letter pi as
"pi" or a telephone dingbat as "Tel". (However, the list of recognized symbols cannot
be extended with these symbols themselves!)

The interactive learning is enabled with the "Learn" button on the main toolbar 
(or with the option "Interactive Learning" under the "Learning" menu). 

(Interactive learning does not apply to Asian documents: learning does not 
make sense for these languages which use thousands of different symbols - and 
you'd have to be able to enter the ideograms, not an easy task when using a 
Western keyboard!) 

At the end of the recognition, Readiris displays the recognized text progres- 
sively and the system stops on doubtful characters, or - if you are dealing with 
touching characters ("ligatures") - on doubtful character strings. They are
always presented in their context, with the doubtful characters highlighted.






Unrecognized characters are by default represented by a tilde (the "~" sym- 
bol). The "reject" character can be modified with the "Preferences" command 
under the "Readiris" menu. 






If necessary, enter a character (or character string) for the incorrect or un- 
known shape and click one of the following buttons. 



Learn 



You agree with the proposed solution or correct it. The program saves this
doubtful character in the font dictionary as "sure", that is, final. Future recognition will
no longer require your intervention: the shape is considered learnt once and for
all.

In the example above, the system stops on a damaged character, and we click 
"Learn" to accept a shape which cannot be confused with other characters. 

Don't Learn 

You agree with the proposed solution or correct it. The difference with the 
"Learn" button is that the learnt symbol gets the status "unsure" in the dictionary. 
For future recognition, the system will propose the "learnt" solution but still
require a confirmation.

This button is used for symbols which might be confused with others: a de- 
faced "e" which might be mistaken for a "c", a damaged "t" which closely re- 
sembles an "r" etc. 






The "e" above is seriously damaged - in fact it is close to the letter "c", and you 
should click "Don't Learn" so as not to confuse it with the symbol "c". 

Delete 

The displayed form is eliminated from the output. This button is used to ignore
"noise" on the documents - spots, coffee stains etc. which might get recognized
as periods, commas and what have you - and to erase any other unwanted
symbol.

Undo 

You go back to correct mistakes. You can undo the last nine decisions.

Finish 

The learning process is aborted but the OCR continues in automatic mode. All 
decisions by the system thereafter are accepted without user validation. 

Click this button when you see that the recognition is highly accurate and does 
not require detailed proofreading.

Abort 

Don't confuse "Finish" with the "Abort" button: with "Abort", no output is
generated and you start all over; with "Finish", the text is created, it just isn't
proofread in detail!

The Role of Font Dictionaries 



The results of each training session are temporarily held in the computer's 
memory but can and should be stored in files called "dictionaries" for future use. 

Font dictionaries should be loaded into memory when you want to recognize 
similar documents in order to make use of the extra intelligence they contain; in 
this way, Readiris takes into account the intelligence stored in these font libraries. 
You could say that Readiris gets more intelligent each time you use it!

Initially, all input from the user is simply held in the computer's memory. No
font shapes are actually saved until you use the command "Save As" under the
"Learning" menu. When you do so, all learnt shapes contained in the RAM
memory are stored in files called "font dictionaries" for future use.






The command "Open Dictionary" allows you to load font dictionaries back into
memory.






The active dictionary is mentioned at all times in the title bar of the interactive 
learning window! When no dictionary has been saved yet, the name "Untitled 
Training" is used. Click the "Abort" button of the interactive learning in case you 
have loaded the wrong font dictionary! 






Use the command "New Dictionary" to "unload" whichever dictionary is loaded 
into memory. 






You can also extend existing dictionaries by loading them, performing
extra learning and saving them again. (There's a specific command to allow
you to quickly save the current dictionary!)






Font dictionaries are limited to 500 shapes, so it is best to create
separate dictionaries for specific applications, for instance per type of
document. For clarity, give the font dictionaries meaningful names, for
instance Report, Palatino etc. Training no longer has effect when
the dictionary is full: the results of the learning are no longer held in memory or
written to a dictionary.
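
The behaviour of the "Learn" and "Don't Learn" buttons and the 500 shape limit can be modelled with a small Python class. This is a toy model under stated assumptions, not the real file format or internal logic of Readiris: the class `FontDictionary`, its methods and the string keys standing in for shapes are all hypothetical.

```python
MAX_SHAPES = 500  # limit stated in the text

class FontDictionary:
    """Toy model of a font dictionary: shape -> (character, status)."""

    def __init__(self, name="Untitled Training"):
        self.name = name
        self.shapes = {}

    def train(self, shape, character, sure):
        """Store a shape; return False when the dictionary is full."""
        if shape not in self.shapes and len(self.shapes) >= MAX_SHAPES:
            return False  # training no longer has any effect
        self.shapes[shape] = (character, "sure" if sure else "unsure")
        return True

    def needs_confirmation(self, shape):
        """Unknown and "unsure" shapes still require the user's confirmation."""
        entry = self.shapes.get(shape)
        return entry is None or entry[1] == "unsure"

report = FontDictionary("Report")
report.train("shape-001", "e", sure=True)    # the "Learn" button
report.train("shape-002", "e", sure=False)   # the "Don't Learn" button
print(report.needs_confirmation("shape-001"))  # False
print(report.needs_confirmation("shape-002"))  # True
```

A shape stored as "unsure" is still proposed later on, but the model keeps asking for confirmation, which mirrors the description of "Don't Learn" above.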

Saving the Results in a Text File 



The interactive training concludes the character recognition; you will be prompted 
to save the OCR result to a text file. Just click "Save" for the time being. 






Click the "Format" button on the main toolbar (or select the command "Output 
Format" under the "Settings" menu) to discover the versatile output capabilities of 
Readiris.

[Illustration: the "Output Format" dialog. Format: RTF. Layout options: "Create body text", "Retain word and paragraph formatting", "Recreate source document" (with a "Fonts" button) and "Use columns instead of frames". PDF options: "Include page image" and "Create bookmarks". Further options: "Merge lines into paragraphs" and "Include graphics". Output: "Ask file name and location" and "Send to".]



Readiris supports the file formats Text (ASCII), RTF ("Rich Text Format"), 
HTML and Adobe Acrobat PDF. The RTF format is used by default. Note that 
the file extension of the selected format is added automatically to the file name.



The option "Ask File Name and Location" determines whether you are prompted 
to save the recognized text at the end of the recognition phase. 



Sending the Result Directly to Your Application

But we can also send the recognized text directly to our text application - as
an alternative to saving a text file or in addition to it. For instance, if
Microsoft Word functions as your target application, your wordprocessor will be 
started up automatically at the end of the recognition (if necessary) and the rec- 
ognized text will be inserted inside a new document. 

The "Send to" feature offers a direct OCR link between your scanner and 
your Mac OS applications. Readiris exports recognized documents directly to 
any text-based Mac OS application - wordprocessors such as Microsoft Word, 
spreadsheets such as Microsoft Excel, web browsers such as Apple Safari, ap- 
plication suites such as AppleWorks and standard Mac OS text applications such 
as TextEdit. 

Use the option "Add Application" to "declare" an application as a possible 
output target; all "declared" applications remain so until they are removed again 
with the option "Remove Application". Select "None" to disable the use of a tar- 
get application momentarily. 






We recommend assigning different applications to the various formats,
so that several applications become available as output targets. To make things 
easier for you, you're prompted to assign target applications to the supported text 
formats the first time you run Readiris. 



[Illustration: the first-run "Output" dialog. It asks for your preferred document type and output format, lets you associate each text format Readiris supports (for instance TEXT) with an application that is launched automatically when the recognition is done, and notes that any choice can be modified later on with the "Format" button on the toolbar.]



Note that the "Send to" option also allows you to copy the recognized text to 
the clipboard, so there is no strict need to export the result to an application. . . or 
save it to a text file! 

Seeing the Text Result 



To conclude, Readiris offers several methods when it comes to saving the OCR
result: copying the result to the clipboard, saving the result in a text file, exporting
the recognized document directly to a target application, or both saving the
result in a text file and sending the recognized document to an application.

After the OCR, the scanned image is redisplayed with the zoning as created,
so that it is available for further processing. It stays there until you scan another page.

You can now open the recognized text with your wordprocessor or text editor, or
import it into your desktop publishing software or any other text-based
application. You have indeed converted a paper document into an editable computer file,
up to 40 times faster than manual retyping! Go ahead and compare it with
the image you have inside your Readiris window.

[Illustration: the file "English.txt" opened in TextEdit.]



A word about OCR 

The aim of OCR is to automatically enter printed text document in a very 
effective and low cost way. Although the first research and development on
Optical Character Recognition (OCR) began more than 30 years ago, this 
technology is still unknown by most of the people who could use it for their
document entry applications. Now, you can use this effective tool in your 
office and unburden yourself with the fastidious task of retyping printed text. 
OCR is the most efficient and fastest tool to enter texts into your computer 
automatically. 

The document is read by your scanner. This device acts as the "eye" of 
your computer and sends it the image. At this step, the document image is 
only a meaningless cloud of black points, pixels, on a white background. The 
OCR software has to extract text information from these pixels: it has to 
recognize shapes by assigning characters. The system extensively uses 
linguistic databases when analyzing the context, in this way finding correct 
solutions for difficult cases. The user trains the software on new characters 
and typestyles, which are recognized automatically later on. This learning 
module allows you to read virtually any font. In other words: the software 
gets more intelligent each time you use it!

Copyright Image Recognition Integrated Systems 
Web site: http://www.irislink.com 



Recognizing Multiple Pages 



But how do you save the text of additional pages? Or in other words: how do 
you process documents consisting of multiple pages? It's actually very simple: go 
on recognizing pages, but enable the option "Append to File" when you are saving 
to the same file!
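
In programming terms, "Append to File" is the difference between opening a file for writing and opening it for appending. The Python sketch below illustrates that idea only; the function `save_page` and the page texts are invented for the example, and the file is written to a temporary folder so that nothing on your disk is touched.

```python
import tempfile
from pathlib import Path

def save_page(path, text, append):
    """Write recognized text to `path`, appending when `append` is true."""
    mode = "a" if append else "w"
    with open(path, mode, encoding="utf-8") as handle:
        handle.write(text + "\n")

with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "English.txt"  # file name taken from the example above
    save_page(target, "Text of page 1", append=False)  # the first page creates the file
    save_page(target, "Text of page 2", append=True)   # later pages are appended
    print(target.read_text(encoding="utf-8"))
```

Without the append mode, the second call would overwrite the first page instead of adding to it.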



But there's a more efficient way of recognizing several pages than scanning 
and OCRing them one after the other: processing multipage documents di- 
rectly! 

To scan a document composed of several pages in one operation, enable the 
document feeder of your scanner. Study the Photoshop "plug-in" or Twain driver 
of your scanner to see how this works. Place the pages of your document in the 
automatic document feeder and start the scanning. 

You can also open multiple prescanned images. To load several images, select 
the first image and hold down the Command key as you select additional images. 
To load a continuous range of images, select the first image and hold down the 
Shift key as you select the last image.



And you can open multipage TIFF files. When you do so, a page number is 
added to the "root" of the image file. Open the sample file Multipage.tif to give it 
a try; the various pages are displayed one after the other. 






All images you scan or load into memory are added to the current document 
until you click the command "Close Document" or "New Document" under the 
"File" menu. Closing a document or creating a new one "cleans the slate". Any 
document loaded into memory - containing a single page or multiple pages - is 
erased. 




The page toolbar gives direct access to the various pages of the document. 
To go to a page, click it in the page toolbar. The selected page is highlighted. 

You can also edit multipage documents, mainly to correct scanning errors: you
can drag pages to the trashcan below to delete them and you can drag and drop
them to other locations in the document to reorder them.

Start the recognition on the sample image Multipage.tif. 

If the interactive learning is enabled, you go through the recognition and
learning phases page by page.

When you click the "Finish" button, all decisions by the system thereafter are
accepted without user validation. In other words, the interactive learning is aborted
for all pages; the OCR for this document continues in automatic mode.

The recognition result of multipage documents is saved in a single output file: 
you are prompted to specify the filename after the first page and the following 
pages get appended. When the recognition result is sent to a target application, 
multiple pages get created inside a single document. 

Organizing the Text Output 



Saving or exporting the text means more than selecting an output method - 
saving a file, sending the output to a target application or the clipboard, or doing 
both - or defining a filename for the output file. You also select a file format and 
determine the appearance of the recognized text. In short, you have to decide 
where you want to take the text before you launch the execution. 

Some options of the "Format" button allow you to influence the look of the text 
output. 

The text flow of the output document is directly influenced by the option 
"Merge Lines into Paragraphs". 


Keep this option enabled to have Readiris detect the paragraphs: Readiris will
then apply the normal wordwrap typical of wordprocessors. Otherwise, a
carriage return is added after each line and hyphenated words remain so! Paragraph
detection is enabled by default.

Let's give an example to clear things up. When the first three lines of a col- 
umn are "The new presi-", "dent waved from the balcony." and "His wife had 
joined him.", the paragraph detection gives you the following result: "The new 
president waved from the balcony. His wife had joined him." The hyphenated 
parts of the word "president" were "reglued" and a space was added at the end 
of the first sentence, thus creating naturally flowing text.

Had paragraph detection not been enabled, the original layout would have
been retained, with a carriage return added at the end of each line.
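
The "president" example can be reproduced with a short Python function. It is a deliberately naive sketch of the idea, not the paragraph detection Readiris performs: the function `merge_lines` is a hypothetical name, and it reglues every word that ends a line with a hyphen, so a genuine hyphen at a line end (as in "self-" followed by "learning") would be joined too.

```python
def merge_lines(lines):
    """Join the lines of one paragraph, regluing words hyphenated at a line end."""
    text = ""
    for line in lines:
        line = line.strip()
        if text.endswith("-"):
            text = text[:-1] + line      # reglue "presi-" + "dent ..."
        elif text:
            text += " " + line           # an ordinary line break becomes a space
        else:
            text = line
    return text

column = ["The new presi-", "dent waved from the balcony.", "His wife had joined him."]
print(merge_lines(column))
# The new president waved from the balcony. His wife had joined him.
```

A more careful implementation would presumably consult a lexicon before removing a hyphen.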

Setting up Your Scanner 



Let's set your scanner up now. It is assumed that the scanner hardware and 
necessary software are installed correctly on your computer system. 

Actually, it's all very easy: Readiris exploits the Photoshop "plug-in" or Twain 
driver of each scanner to support it. In other words, as soon as there's a Photoshop 
"plug-in" or Twain module available for your scanner model, Readiris supports 
it effortlessly! 

In short, locate your scanner's Photoshop "plug-in" on your hard disk and 
copy it to your system's "Application Support" folder. Next, select your "plug-in" 
under Readiris with the option "Scanner" of the "Preferences" command under 
the "Readiris" menu. 


To use a Twain driver, simply select it in the "Preferences" command. 

The option "Invert Image" allows you to generate "inverted" images - this 
option is useful to process full pages with white text on a dark background. (These 
options do not apply to scanners using the Photoshop "plug-in".) 

The selected scanner is mentioned in the main toolbar; the title bar of the 
image window and the filename in the page toolbar indicate which scanner was 
used to acquire the image. (Given our example, page 1 was scanned with Agfa's 
ScanWise "plug-in", and that "plug-in" is still the active scanner.)



Go to the Readiris "Read Me" file or to chapter 1 of this manual should you 
need further information. 



Scanning Documents 



Now that our scanner is set up, we want to get started scanning documents. 

The scanner's Photoshop "plug-in" or Twain driver is used to set the scanning
resolution, the page format and orientation, brightness and contrast. (The
contrast setting is only available on some scanners.) Which scanning options are
available to you depends on your scanner model. Refer to the software documentation
that accompanies your scanner.






There are some elements you should be aware of. First of all, pay some atten- 
tion to lineskew. Although the page analysis and recognition are skew-tolerant, it 
may become difficult to zone and OCR a page correctly when the skew is too 
significant. Limited lineskew (less than 0.5°) can be ignored because the OCR 
accuracy does not suffer.

The option "Page Deskewing" under the "Options" button (or under the
"Settings" menu) determines whether pages which were scanned at an angle will be
deskewed, that is, straightened automatically. Limited lineskew gets ignored. This
option is disabled by default.






If you forgot to enable this option, use the command "Deskew Page" on the 
image toolbar (or under the "Process" menu) to "straighten" pages that were 
scanned at an angle. 

The deskewing takes a few seconds: the image is analyzed to detect the skew
angle - if any -, the color or greyscale image and its black-and-white version are
deskewed and the page analysis gets re-executed.
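
The 0.5° tolerance mentioned earlier can be illustrated with a little trigonometry. The sketch below assumes that the skew is measured along a single text baseline between two points; that measuring method, the function names and the sample coordinates are assumptions for the sake of the example, not a description of how Readiris analyzes the image.

```python
import math

SKEW_TOLERANCE_DEGREES = 0.5  # limited lineskew below this value can be ignored

def skew_angle(x1, y1, x2, y2):
    """Angle in degrees of a text baseline running from (x1, y1) to (x2, y2)."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1))

def needs_deskewing(angle):
    """True when the skew is large enough to be worth correcting."""
    return abs(angle) >= SKEW_TOLERANCE_DEGREES

# Hypothetical baseline: over 2000 pixels the line drifts by 10 or by 60 pixels.
for rise in (10, 60):
    angle = skew_angle(0, 0, 2000, rise)
    print(f"{angle:.2f} degrees, deskew: {needs_deskewing(angle)}")
# 0.29 degrees, deskew: False
# 1.72 degrees, deskew: True
```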

You may also need to adjust the page orientation. Use the rotation tools on 
the image toolbar. (Corresponding commands are found under the "Process" menu.) 
Three rotation directions are available: to the right, to the left and upside down. 
Rotation also takes a few seconds as the image itself is updated, not just the 
display on-screen. 
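
The three rotation directions can be shown on a tiny pixel grid. This Python sketch treats an image as a list of rows, which is an assumption chosen for readability; it says nothing about how Readiris stores or rotates its images, and the function names are invented.

```python
def rotate_right(image):
    """Rotate a row-major pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def rotate_left(image):
    """Rotate a row-major pixel grid 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]

def rotate_180(image):
    """Turn the grid upside down."""
    return [row[::-1] for row in image[::-1]]

page = [[1, 2, 3],
        [4, 5, 6]]
print(rotate_right(page))  # [[4, 1], [5, 2], [6, 3]]
print(rotate_left(page))   # [[3, 6], [2, 5], [1, 4]]
print(rotate_180(page))    # [[6, 5, 4], [3, 2, 1]]
```

Every pixel moves to a new position, which is why rotating the image itself takes longer than merely changing the display.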






However, Readiris can correct badly oriented pages for you. Enable the op- 
tion "Page Orientation Detection" under the "Options" button (or under the "Set- 
tings" menu) and Readiris will correct the page orientation where needed. 






You can make good use of the image Deskew.jpg in the image folder if you want
to try it. Enable the options "Page Deskewing" and "Page Orientation Detection" 
before you open the image and let Readiris restore the Tower of Pisa the way we 
like it.




Bring Color to Your Text Scans! 



Readiris supports black-and-white, greyscale and color images on an equal
basis, so you are free to choose the color mode that best suits your needs. To
include lineart graphics in the recognized documents, scan in black-and-white; to
include black-and-white photos, scan in greyscale; to include color pictures, scan
in color.

Readiris processes "true color" images (16 million colors) by default, but you
can process smaller images to limit the system requirements. Use the
"Preferences" command under the "Readiris" menu to process 16 bit palette images
(65,536 colors), 8 bit images (256 colors or greyscales) or 1 bit images
(black-and-white).


It goes without saying that greyscale and color images are slower to acquire
and require more RAM than "bilevel" images! When you increase the color mode to
true color, the required free RAM increases from 22 MB to 32 MB on Mac OS 9.x
systems! (This does not apply to computers that run Mac OS X: that operating
system handles memory management entirely autonomously.)

Note that the image size and bit depth are shown on the status bar of the
image window.

1872x1939x32 14209K 300dpi
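
To get a feeling for why the color mode matters so much for memory, you can
estimate the uncompressed size of a page yourself. The sketch below is our own
illustration, not how Readiris computes the figure on its status bar; the page
dimensions (a letter-size page scanned at 300 dpi, 2550 x 3300 pixels) are an
assumption chosen for the example.

```python
def raw_image_bytes(width, height, bits_per_pixel):
    """Uncompressed bitmap size, with each row rounded up to whole bytes."""
    bytes_per_row = (width * bits_per_pixel + 7) // 8
    return bytes_per_row * height


# Assumed example page: 8.5 x 11 inches at 300 dpi.
WIDTH, HEIGHT = 2550, 3300

for label, depth in (("black-and-white", 1), ("greyscale", 8), ("true color", 24)):
    size_kb = raw_image_bytes(WIDTH, HEIGHT, depth) / 1024
    print(f"{label:16s} {depth:2d} bit  {size_kb:10.0f} K")
```

The same page needs roughly 8 times more memory in greyscale than in
black-and-white, and 24 times more in true color.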

Readiris creates a black-and-white version for every greyscale and color
image. To view a scanned image in black-and-white, disable the option "Image in
Color" under the "View" menu.

Different Devices, Different Resolution



Whatever your scanning mode may be, maintain a scanning resolution of
300 dpi. In all probability this is not the default setting of your Photoshop
"plug-in" or Twain driver! Select a resolution of 300 dpi for normal
applications; use a higher resolution of 400 dpi for small print (below
10 point) and when the document is very degraded.

Readiris reads point sizes of 6 to 72 point (0.08 to 1" or 0.21 to 2.54 cm). 
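
The relation between point size, resolution and the number of pixels the OCR
engine gets to work with can be worked out as follows. This is a small sketch
under the usual convention that 72 points make one inch (which is also what the
figures above imply); the function names are ours, and the rule in
`suggested_dpi` merely restates the advice given in this section.

```python
POINTS_PER_INCH = 72


def point_size_in_pixels(points, dpi):
    """Nominal height in pixels of a given point size at a given resolution."""
    return points * dpi / POINTS_PER_INCH


def suggested_dpi(points, degraded=False):
    """300 dpi normally, 400 dpi for print below 10 point or degraded pages."""
    return 400 if points < 10 or degraded else 300


for size in (6, 10, 12, 72):
    dpi = suggested_dpi(size)
    print(f"{size:2d} point at {dpi} dpi: {point_size_in_pixels(size, dpi):.0f} pixels")
```

At 300 dpi, 6 point print is only 25 pixels tall, which is why the higher
resolution is worth the extra scanning time for small print.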

6 point 

72 point 

Readiris also recognizes "drop letters", large caps that cover several lines. 
(These can of course be no bigger than 72 point!) 

Readiris reads drop 
letters (also called 
"drop" caps) that 
cover several lines and 
assigns them to their starting 
line. 

Faxes have a resolution of 100 or 200 dpi; when you create images with a
digital camera, the resolution is unknown; and when you open images, the file
header may contain an incorrect resolution. To process such images hassle-free,
enable the option "Process as 300 dpi" under the "Preferences" command of the
"Readiris" menu. This setting applies to both direct scanning and the opening
of prescanned images.




When your images are acquired by a digital camera instead of a scanner, it 
is mandatory that you enable another special option, "Digital Camera", in the 
"Preferences" command. This parameter again applies to direct scanning and 
prescanned images. 


By doing this, you enhance the image before it gets recognized. There are
specific challenges to be met when it comes to digital cameras: they produce
low-resolution images - even when you hold the camera very close over your
document - and the image resolution is in any case unknown.

There are some "finer points" to be aware of when it comes to successfully 
recognizing images captured with a digital camera. 

First of all, select the highest possible image resolution: create, for
instance, 2,048 x 1,536 images even when 1,024 x 768 and 640 x 480 images are
also supported. Secondly, enable the "macro" mode of your camera, which is meant
for closeups, as photographs of documents always are. (This mode was designed
to capture flowers, insects etc.) Otherwise, the images are blurred and
illegible.
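
Why the highest camera resolution matters becomes clear when you estimate the
"effective" resolution of a photographed page. The sketch below assumes that
the page exactly fills the width of the picture and that the page is 8.5 inches
wide; both are assumptions for the sake of the example, and the function name is
ours.

```python
def effective_dpi(pixels_across, inches_across):
    """Pixels available per inch of paper when the page fills the frame."""
    return pixels_across / inches_across


PAGE_WIDTH_INCHES = 8.5  # assumed page width

for width, height in ((2048, 1536), (1024, 768), (640, 480)):
    dpi = effective_dpi(width, PAGE_WIDTH_INCHES)
    print(f"{width} x {height}: about {dpi:.0f} dpi")
```

Even the largest of these image sizes stays below the 300 dpi recommended for
scanners, and the smallest ones fall far short of it.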





Use little or no compression: heavy compression reduces the sharpness of the
captured text. Zoom manually to crop your document - some cameras are bundled
with photo stitching software, but don't bother using it for document capture.

Hold the camera directly above the document to avoid capturing the docu- 
ment at an angle. However, avoid shadows cast on the document by the camera 
or your hand! Produce stable images. Consider mounting your camera on a tripod 
when necessary. 

Disable the flash when you're photographing glossy paper, otherwise the image
may be too light. Generally speaking, adapt the brightness and contrast to the
environment - daylight, lamp light, neon light etc. (Some cameras can be
calibrated by photographing a white document.)






To give it a try, open the image Digital.jpg in the Readiris image folder and 
execute the recognition. 



Adjusting the Scanned Images



Scanning in greyscale and color isn't just useful to save the graphics with
sufficient quality; in some instances, it's also useful or even necessary to
obtain good OCR results! When text is printed on a color background, scanning
in color may create the tone differences that are lacking in black-and-white
images. When there is only limited contrast between the text and the
background, the background can create "noise" that renders the recognition
difficult or impossible!

Think for instance of black text printed on a dark background: when you scan
such a document in black-and-white, you may not be able to "drop" the
background color without losing the text information as well, however much you
adjust the scanner brightness...

As was already indicated, powerful routines automatically convert color and
greyscale images into black-and-white. Thanks to these intelligent routines,
even tough cases get solved - here's how our "difficult" image gets binarized
by Readiris!

MASAYOSHI SON, 42, president and CEO, 
is the master Net empire builder. His con- 
glomerate holds stakes in 300 Internet 
companies in the U.S., Japan, Europe, and 
other Asian countries. Today, Softbank 
manages about $4 billion in venture capital 
funds for global investments. 

YASUMITSU SHIGETA, 35, has invested in 
more than 70 Web or mobile Net-based ven- 
tures in Japan and the U.S., including Tum- 
bleweed Communications and Phone.com. 
Shigeta is also developing new businesses 
that take advantage of the growth of the 
Internet and mobile communications.

If necessary, you can optimize the image further for the subsequent OCR
process. Select the "Adjust Image" button on the image toolbar (or the command
"Adjust Image" under the "Process" menu) to do so.



When you access this command, the black-and-white version is displayed 
automatically. (It's as if you disabled the option "Image in Color"!) There are 
some complicated concepts here, and we need to discuss them in detail. 






The option "Smoothen Greyscale and Color Images" renders greyscale and color
images more homogeneous by smoothing out ("flattening") relative differences in
intensity. As a result, a sharper contrast is created between the foreground -
the text - and the background - a color, artwork etc.

The image smoothing is also available as an option in the "Preferences"
command under the "Readiris" menu. We suggest that you leave this option
enabled at all times.


Now for the brightness. By "brightness", we actually mean the black-and-white
threshold. The setting "Automatic" determines the bilevel threshold
automatically. Apply a different threshold when necessary by darkening or
lightening the black-and-white image: when you darken the image, more pixels
become black in the black-and-white version; when you lighten the image, fewer
pixels become black.
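
The effect of the threshold can be illustrated with a few lines of code. This
is only a sketch of the general principle of bilevel thresholding, assuming
greyscale values from 0 (black) to 255 (white); the scale and the automatic
threshold Readiris actually uses are not described here, and the names below are
ours.

```python
def binarize(gray, threshold=128):
    """Turn greyscale rows (0 = black .. 255 = white) into 1 (black) / 0 (white).

    A pixel becomes black when its value lies below the threshold, so a
    higher threshold "darkens" the result and a lower one "lightens" it.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def count_black(bilevel):
    """Number of black pixels in a bilevel image."""
    return sum(sum(row) for row in bilevel)


sample = [[30, 90, 140],
          [200, 120, 250]]

print(count_black(binarize(sample, threshold=100)))  # lightened
print(count_black(binarize(sample, threshold=128)))  # middle setting
print(count_black(binarize(sample, threshold=160)))  # darkened
```

The three calls give 2, 3 and 4 black pixels respectively: the same greyscale
image yields more black pixels as the threshold moves towards "darken".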

Note above all that no image adjustment is executed until you click the
"Apply" button! By clicking "OK", you execute the adjustment and close the
window. Here's an example where we lightened the black-and-white image
dramatically - though admittedly not with OCR accuracy in mind!

The first two options concern color and greyscale images; the last one,
"Despeckle", exclusively concerns black-and-white images. "Despeckling" means
that the "parasite pixels" (also called "salt and pepper noise") are removed
from black-and-white images.

Be sure that you don't erase spots that are too big, otherwise you might start
erasing the dots on the letter "i", portions of dot matrix letters etc.!
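
The warning above is easier to appreciate with a small model of despeckling.
The sketch below removes every isolated group of black pixels up to a given
size; it is a generic illustration of the technique and makes no claim about
how Readiris implements it. The function name and the tiny sample page are
ours.

```python
def despeckle(bilevel, max_dot_pixels):
    """Erase connected groups of black pixels (1) of at most max_dot_pixels."""
    height, width = len(bilevel), len(bilevel[0])
    cleaned = [row[:] for row in bilevel]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if bilevel[y][x] != 1 or seen[y][x]:
                continue
            # Collect all black pixels touching this one (4-connectivity).
            stack, blob = [(y, x)], []
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                blob.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and bilevel[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(blob) <= max_dot_pixels:
                for by, bx in blob:
                    cleaned[by][bx] = 0
    return cleaned


# Two single-pixel specks and one 2 x 2 dot, such as the dot on an "i".
page = [[1, 0, 0, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 0, 0, 1]]

print(despeckle(page, 1))  # removes the specks, keeps the dot
print(despeckle(page, 4))  # too aggressive: the dot disappears as well
```

With a limit of 1 pixel only the noise goes; with a limit of 4 pixels the
useful dot is erased too, which is exactly the risk the warning describes.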






By enabling the option "Despeckling" under the "Options" button (and under 
the "Settings" menu) the despeckling is executed automatically on every page 
loaded into memory! 



Page Despeckling ... 



The best way of optimizing the images for the OCR process is this: place the
adjustment window where it doesn't hide the part of the image you want to
judge. Adapt the parameters - clicking "Apply" each time - until the image is
crisp and clear.

Saving Default Settings



Set the program parameters correctly and click the command "Save As De- 
fault" under the "Settings" menu to save the current settings, including your scan- 
ner model, as default settings for future use. 

When you quit the Readiris software and the settings were modified, you are 
invited to save the current settings as default settings. 







Settings files contain more than the scanner model: they also determine whether 
you are going to use interactive learning, which language and font type - for 
instance a normal, proportional font - the documents have, which output mode is 
used - for instance send HTML texts to Internet Explorer - etc. In short, all 
operational settings of Readiris are stored in the settings files. 

Saving Specific Settings 



The default settings will obviously be used at each program startup. To restore 
the default settings without having to quit the Readiris software, use the com- 
mand "Open Default" under the "Settings" menu. 




You can also save specific settings to avoid having to redefine the operational
parameters. The commands "Save As" and "Open" under the "Settings" menu
take care of this.



Let's give an example: if you regularly have to OCR German documents, we
recommend that you create a settings file for this type of document. You would
then select "German" as the document language, disable learning because the
same typefaces are used systematically etc.



Recognizing Pages Automatically 



Now that our scanner is set up, we want to get started capturing documents. 
Instead of going through all the parameters, we'll execute automatic OCR, a 
very comfortable way of recognizing pages. 

Click the "Auto" button (or select the command "Automatic OCR" under the 
"Process" menu). 










We will now perform fully automatic OCR, that is, we will recognize a page
immediately, without any interruption. Automatic OCR means that a page is
successively scanned, zoned ("windowed") by page analysis or a zoning template,
and recognized without interactive learning. All you have to do is initiate the
scanning and save the recognized text; the intermediate steps are handled by
Readiris.



Readiris Recreates Your Document Layout 



Automatic recognition, which renders the recognition process automatic, should
not be confused with autoformatting! "Autoformatting" means that Readiris
recreates a facsimile copy of the scanned document: the word, paragraph and
page formatting of your original document are applied.

Similar typefaces (serif and sans serif, proportional and fixed, normal and 
condensed) are used as in the source document, the point sizes and typestyles 
(bold, italic and underlined) are maintained across the recognition. The tabs and 
the alignment (left, centered, right and justified) of each text block are recreated. 
The placement of columns, text blocks and graphics follows your original docu- 
ment. 

In other words, Readiris allows you to archive a true copy of your documents,
albeit as an editable and compact text file instead of a scanned image!

All this implies that the sorting of windows only partially applies when 
"autoformatting" is used: you can include and exclude zones, but any re-ordering 
of zones is simply ignored! 

Here's an example of how it works. To get acquainted with this feature, open
the image Autoform.jpg which is found in the image folder.

Click the "Format" button, select the text format RTF ("Rich Text Format")
and the layout option "Recreate Source Document". (The option "Merge Lines
into Paragraphs" is enabled by default.) Enable the option "Ask File Name and
Location" to send the reading result to an RTF file or, if Microsoft Word is
installed on your computer, send the OCR result to Microsoft Word.

Note that layout reconstruction is limited to the RTF format - and indirectly to 
target applications that support the RTF format adequately. A "poor" format gen- 
erating "plain" text such as Text (ASCII) does not support advanced formatting 
codes and therefore cannot offer autoformatting. On the plus side, the RTF for- 
mat is a widely used text format that can be opened by any popular wordprocessor. 
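
To see why a "plain" text format cannot carry the layout while RTF can, compare
the two outputs produced by the sketch below. It is a deliberately minimal
illustration, not the RTF that Readiris generates (which is far richer): the
"runs" data structure, the function names and the choice of Arial at 12 point
are all assumptions made for the example, and only ASCII text is handled.

```python
def to_plain_text(runs):
    """Plain text keeps the characters and drops every style."""
    return "".join(text for text, _style in runs)


def to_rtf(runs):
    """Minimal RTF: the same characters plus formatting control words."""
    parts = [r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0\fs24 "]
    for text, style in runs:
        # Backslashes and braces have a special meaning in RTF: escape them.
        escaped = text.replace("\\", r"\\").replace("{", r"\{").replace("}", r"\}")
        if style == "bold":
            parts.append(r"{\b " + escaped + "}")
        elif style == "italic":
            parts.append(r"{\i " + escaped + "}")
        else:
            parts.append(escaped)
    parts.append("}")
    return "".join(parts)


# Hypothetical recognition result: text fragments with their typestyle.
runs = [("Readiris ", "plain"), ("recreates", "bold"), (" the layout", "plain")]

print(to_plain_text(runs))
print(to_rtf(runs))
```

The plain text version simply has no place to store the fact that one word was
bold; the RTF version records it with the control word `\b`.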

When the recognized text is opened in a wordprocessor, the text looks like
this without any intervention by the user.

To see the effect correctly, you need to enable the "WYSIWYG" mode of your
wordprocessor, mostly called "page layout" mode. However, if you send the
recognized document directly to Microsoft Word, the page or print layout view
is activated automatically!



In short, Readiris not only recognizes your texts, but can format them for you 
as well. OCR isn't just text recognition anymore, it is becoming more and more 
page or document recognition as well! 



Columns Please, Not Frames! 



The formatting option "Use Columns instead of Frames" determines how the
"autoformatting" gets done: the text blocks, tables and graphics can either be
stored in frames or in editable columns.

"Frames" are separate containers for text used to position several blocks of 
text, graphics and tables on a page. With columns, the text flows naturally from 
one column to the next, and columnized texts are much easier to edit. 

We now assume that real columns do occur on the scanned document: when 
the system is unable to detect columns in the source document, this formatting 
mode uses frames anyway as a "fallback" position! 

You can make good use of the image Columns.tif in the image folder if you
want to try it.

Furthermore, the button "Fonts" offers you control over the typefaces that
get used to "autoformat" the document, but we recommend that you do not change
the default values! You can select up to four fonts; the defaults are Arial
(sans serif), Times New Roman (serif), Courier New (fixed) and Arial Narrow
(narrow). Changing the default fonts is not recommended for Latin languages.



Text Formatting, Part 2 



The other layout options are "Create Body Text" and "Retain Word and
Paragraph Formatting".

Creating body text means you create a non-formatted, "running" text. The 
text will be captured, but its formatting is entirely ignored. Use this option when 
you just need to recapture a text but not its layout. 

The option "Retain Word and Paragraph Formatting" represents the middle 
road: the word formatting - font type (serif - sans serif, proportional - fixed, 
normal - condensed), point size and typestyle (bold, italic and underlined) - is 
retained across the recognition, and so is the paragraph formatting - the tabs 
and the alignment (left, centered, right and justified). 

Don't confuse this formatting option with "full" autoformatting: this option just 
puts one paragraph after the other, it does not recreate columns or copy the 
relative position of the various zones. 

Creating Portable Documents 



We still need to go deeper into one format: Adobe Acrobat PDF. Readiris 
allows you to create PDF documents and offers lots of options concerning PDF 
files. 






As soon as the PDF format is selected, autoformatting applies (and cannot be
disabled).

Enabling and disabling the option "Include Page Image" lets you create PDF
files of two types: when this option is disabled - as is the case by default -
Readiris creates a PDF file that contains the text result. (Graphics may occur
but only when graphic zones occur on the page - photographs, artwork etc.) In
other words: the page image is not contained in the single-layered PDF file!

When this option is enabled, you get different results: Readiris creates a
searchable PDF file that contains the recognized text and the page image. The
page image is placed above the text in a two-layered PDF file. Use the "Search"
tool of Adobe Reader or Adobe Acrobat and this quickly becomes obvious!



The option "Create Bookmarks" sees to it that a bookmark is created for
each document element - the graphics as well as the text blocks and tables. (For
the text zones, Readiris applies an intelligent algorithm to come up with a
title, a "summary" per zone; the tables and graphics are simply numbered.)
Another navigational element of PDF documents, page thumbnails, can be created
dynamically by your Adobe Reader or Adobe Acrobat software.

... Or Reading Them



Let's look the other way for a moment. As Readiris offers full support of the
Adobe Acrobat PDF format, you won't just generate PDF files, you can also
read them!

"Repurposing" PDF documents may be a major application of Readiris.
There are several reasons why this is the case. First of all, it's a way of
converting images into text: open image-based PDF documents, execute the
recognition and save the OCR result to a text document (in any supported text
format). Text files are editable, image files are not.

Second case: you can convert image-based PDF files to text-based PDF docu- 
ments. You then execute the recognition on "image-only" PDF files and save the 
OCR results... as text-based PDF documents! Text-based PDF files are search- 
able and editable, "image-only" PDF files are not. 

Finally, converting PDF files is a way of "unlocking" PDF content. You can
recognize "read-only" PDF documents, where the text is normally inaccessible.
With unprotected PDF files, the content can be retrieved (copied and saved to
an RTF file); with "read-only" files, the content cannot be extracted. These
documents can only be viewed and printed!

An important nuance: Readiris does not open password-protected PDF docu- 
ments, even if all other PDF security barriers are broken down by Readiris! 

Proceed as usual: load PDF files into memory just as you open prescanned
images - faxes, snapshots made with your digital camera etc. (You can give it a
try with the file Sample.pdf in the Readiris image folder if you care to...)

Saving Graphics Separately



In our PDF example, the graphic was included in the recognized text; whether
this is the case depends on the formatting option "Include Graphics". Saving
graphics inside the text is only possible with "full" autoformatting, not with
a "poor" text format such as Text (ASCII).

Still, with Readiris, you can save graphics without performing text
recognition. As Readiris supports black-and-white, greyscale and color images,
you can capture lineart graphics and photographs.

How? Draw a graphic zone around the illustrations, cartoons etc. you need.
Creating graphic windows manually is done in the same way as drawing text and
table windows: simply select the graphic window tool on the image toolbar
(or under the "Layout" menu).



Draw Graphic Zones 



Similar to the other window types, the status bar of the image window tells 
you how many graphic zones there are. 

Next, choose the command "Save Page As" under the "File" menu and enable 
the option "Graphics Only". You are prompted to specify a filename. 






Determine which graphic file format you will use. Select a format that's sup- 
ported by your paint or photo retouching software. A multitude of popular graphic 
formats is available: JPEG, Photoshop, PNG, PICT, TIFF and Windows bitmaps 
(BMP). 



The graphics are saved in a single file. You don't have to limit yourself to a 
single graphic, but if you draw several graphic windows, they will be collected, 
"stacked" in a single file. (You can use the crop command of your paint or photo 
retouching program to separate them.) 

Sides smaller than 1 mm are not allowed - bitmaps of that size hardly contain 
any information. "Irregular", non-rectangular windows are allowed, and so are 
several graphics. The surface not covered by your "complex" graphic zones 
remains white. In the example below, two graphics zones - one in the left lower 
corner and the other in the upper right corner - lead to lots of white space around 
the actual graphics. 



Reading Faxes and Deferred Recognition 



Saving images as image files opens another possibility: you can save the full
page and perform deferred OCR on it later on. That's what we did with the
prescanned images of our tutorials.

Simply scan a document and select the command "Save Page As" under the
"File" menu. (This command only saves single pages.) You'll be prompted to save
the entire page as a graphic file when you enable the option "All". (Any windows
you might have detected or drawn on the page are ignored.)



The color mode of the original image - color, greyscale or black-and-white - is 
always maintained. 

Select an appropriate graphic format - various graphic formats are available. 
When you save a document as a JPEG file for deferred OCR, ensure that you 
maintain sufficient image quality. JPEG files with high compression rates de- 
grade the image quality - and the performance of your OCR software can suffer 
as a consequence. 

As we just indicated, the command "Save Page As" exclusively saves the
current page. There's a much more efficient way of saving your scans in graphic
files for later OCR: enable the image scanning mode.

To do so, select the document type "Image" on the main toolbar (or under the 
"Settings" menu). Note that the "Recognize" button is now replaced by the "Send" 
button! 






Click the "Format" button to discover what this means. You have the same
flexibility that you have when you're recognizing documents: you can save your
scans in files and send them directly to a target application - Photoshop, the
Preview application etc. (Note how the "Format" button indicates the selected
graphic format!)






Clicking the "Send" button exports all scans of the current document. 













Image_1.tif   Image_2.tif



Obviously, you can load the image files into memory with the "Open" button on
the main toolbar (or with the corresponding command under the "File" menu). Or
double-click the icon of a Readiris image to load it into Readiris. (You can
even select several of Readiris' image files and double-click to load them
into memory simultaneously...)

Color, greyscale and black-and-white images are supported on an equal basis: 
Readiris allows you to open FlashPix images, GIF images, JPEG images, MacPaint 
images, Photoshop images, PICT images, PNG images, QuickDraw GX images, 
QuickTime images, Silicon Graphics images, Targa images, (uncompressed, 
packbits and Group 3 compressed) TIFF images, multipage TIFF images and 
Windows bitmaps (BMP). (Readiris also opens Adobe Acrobat PDF documents.) 

This capability is particularly useful to convert your faxes into editable text
files! If you have any influence over your correspondents, ask them to send
faxes with the "fine" quality - those faxes have the higher resolution of
200 dpi and will yield better OCR results.

Recognizing Tables 



So far, we've recognized texts and faxes and we've saved graphics. Let's 
process a table now. Take a table of figures and scan it, or open the sample image 
Tables.jpg in the image folder. 

Actually, the image Tables.jpg contains two tables, and that's no coincidence!
The page analysis zones them as table windows, and Readiris will reconstruct
them for you by recreating the tables cell by cell in your spreadsheet or by
inserting a table object inside your wordprocessor files.



Run the recognition with the layout option "Retain Word and Paragraph
Formatting" or "Recreate Source Document" enabled and the tables get recreated.
Open your wordprocessor to have a look at the result.

Have a closer look at the "gridded" or "framed" table - the scanned table that
had borders around the cells. The cells and the borders were recreated by
Readiris one by one!

Let's concentrate on the "ungridded" table for a moment - it has no borders
around the cells. Note that the page analysis has nevertheless detected it.
There's another interesting aspect to this table: its content is purely numeric!

For optimal OCR accuracy of such tables, we can limit the recognition to the 
numeric symbols with the "Language" button. (The numeric mode is not strictly 
numeric, it includes the symbols 0 to 9, +, *, /, %, , (comma), . (dot), (, ), -, =, $, £ 
and the € symbol.) 
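
If you are unsure whether a table qualifies for the numeric mode, the rule can
be expressed as a simple check. The sketch below uses the symbol list given
above; treating blank space as acceptable is our own assumption, and the
function name is illustrative only.

```python
# The symbols of the numeric reading mode as listed in this section.
NUMERIC_MODE_SYMBOLS = set("0123456789+*/%,.()-=$£€")


def fits_numeric_mode(cell_text):
    """True when a cell holds only numeric-mode symbols (spaces are tolerated)."""
    return all(ch in NUMERIC_MODE_SYMBOLS or ch.isspace() for ch in cell_text)


for cell in ("1,250.00", "(12.5%)", "€ 300", "Total 300"):
    verdict = "numeric mode" if fits_numeric_mode(cell) else "needs a language"
    print(f"{cell!r}: {verdict}")
```

A single cell such as "Total 300" is enough to rule out the numeric mode for
the zone it belongs to.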






As you can only do this when the table doesn't contain any alphabetic symbols 
- otherwise the text portions won't be recognized correctly - we can activate the 
numeric mode only when we recognize this table but not the rest of the docu- 
ment. 

When we do so by selecting this table with the "Sort" button, we can send the
OCR result directly to the spreadsheet Microsoft Excel. Select HTML as text
format and Excel as target application with the "Format" button.



The spreadsheet is started up and the typical table structure with rows and 
columns gets recreated; you are immediately ready to process the data. 
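
To see why HTML is a suitable format for handing a table to a spreadsheet, 
consider the following Python sketch. It is an illustration only and does not 
show the actual output of Readiris: the function name rows_to_html_table and 
the sample figures are hypothetical. The point is that an HTML table marks 
every row and every cell explicitly, so a spreadsheet that opens the file can 
rebuild the rows and columns. 

```python
# Illustration only: this is not Readiris' actual output. Names are hypothetical.
from html import escape


def rows_to_html_table(rows):
    """Render a list of rows (each a list of cell values) as an HTML table."""
    lines = ["<table>"]
    for row in rows:
        # Each cell becomes a <td>; escape() protects characters such as < and &.
        cells = "".join(f"<td>{escape(str(cell))}</td>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample figures, standing in for a recognized numeric table.
    sample = [["1,200", "15%"], ["3,450", "22%"]]
    print(rows_to_html_table(sample))
```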
You may come across "ungridded" tables that the page analysis does not detect as 
table zones because the columns are too widely spaced - Readiris tries to avoid 
confusion with columnized text blocks. To create a table window manually, click 
on the table window tool in the image toolbar and proceed as usual. 



[Screenshot, status bar: 1 text zone(s) - 1 graphic zone(s) - 1 table zone(s)] 



Getting On-line Help 



This concludes our overview of Readiris. Some last-minute information may 
not be included in this manual. We therefore recommend that you consult the 
on-line help system for additional information on Readiris. 

Go to the "Help" menu to do so. The command "Readiris Help" allows you to 
navigate through the many help topics. 
[Screenshot: the Readiris help window with "Index" and "Search" buttons and a 
list of help topics, including "Welcome to the Readiris help", "Introducing 
OCR", "Recognizing Documents", "Recognizing Business Cards", "Scanning Images", 
"How to...", "Reference Information", "Product Registration", "Product Support" 
and "I.R.I.S."] 




Welcome to Readiris™ Help... 

• Use on-line help to learn more about Readiris, 

• Quickly find answers to questions. 

• Connect to the I.R.I.S. web site for latest tips and product updates. 
©2003 Copyright I.R.I.S. All rights reserved 






You can also find more information on Readiris on the I.R.I.S. web site 
(www.irislink.com); the command "I.R.I.S. on the Internet" takes you directly to 
the I.R.I.S. home page. 



I.R.I.S. on the Internet 